Using automatically labelled examples to classify rhetorical relations: an assessment

Authors

  • Caroline Sporleder
  • Alex Lascarides
Abstract

Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rhetorical relations are sometimes lexically marked, i.e., signalled by discourse markers (e.g., because, but, consequently etc.), and it has been suggested (Marcu and Echihabi, 2002) that the presence of these cues in some examples can be exploited to label them automatically with the corresponding relation. The discourse markers are then removed and the automatically labelled data are used to train a classifier to determine relations even when no discourse marker is present (based on other linguistic cues such as word co-occurrences). In this paper, we investigate empirically how feasible this approach is. In particular, we test whether automatically labelled, lexically marked examples are really suitable training material for classifiers that are then applied to unmarked examples. Our results suggest that training on this type of data may not be such a good strategy, as models trained in this way do not seem to generalise very well to unmarked data. Furthermore, we found some evidence that this behaviour is largely independent of the classifiers used and seems to lie in the data itself (e.g., marked and unmarked examples may be too dissimilar linguistically and removing unambiguous markers in the automatic labelling process may lead to a meaning shift in the examples).
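
The automatic labelling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the marker-to-relation table, the function name `label_and_strip`, and the rule of skipping sentences with zero or several markers are all assumptions made for illustration. The sketch finds a single discourse marker, assigns the corresponding relation, and removes the marker so that a classifier would have to rely on the remaining words.

```python
import re

# Hypothetical marker-to-relation table (illustrative assumption only;
# the paper's actual marker inventory is not given in the abstract).
MARKER_TO_RELATION = {
    "because": "explanation",
    "but": "contrast",
    "consequently": "result",
}

# Match any marker as a whole word, plus an optional comma and trailing spaces.
_MARKER_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, MARKER_TO_RELATION)) + r")\b,?\s*",
    re.IGNORECASE,
)


def label_and_strip(sentence):
    """Return (relation, sentence_without_marker).

    Returns None when the sentence has no marker or more than one,
    since such examples cannot be labelled unambiguously in this sketch.
    """
    matches = list(_MARKER_RE.finditer(sentence))
    if len(matches) != 1:
        return None
    m = matches[0]
    relation = MARKER_TO_RELATION[m.group(1).lower()]
    # Remove the marker so the label is no longer visible in the text.
    stripped = (sentence[:m.start()] + sentence[m.end():]).strip()
    stripped = re.sub(r"\s{2,}", " ", stripped)
    return relation, stripped


if __name__ == "__main__":
    for s in [
        "He stayed home because it rained.",
        "It was late, but she kept working.",
        "The sky is blue.",
    ]:
        print(s, "->", label_and_strip(s))
```

The stripped sentences, paired with their relation labels, would form the automatically labelled training set. The abstract's caution applies directly to this step: deleting the marker may shift the meaning of the example, and marked examples may differ linguistically from genuinely unmarked ones.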

Similar resources

Exploiting Linguistic Cues to Classify Rhetorical Relations

We propose a method for automatically identifying rhetorical relations. We use supervised machine learning but exploit cue phrases to automatically extract and label training data. Our models draw on a variety of linguistic cues to distinguish between the relations. We show that these feature-rich models outperform the previously suggested bigram models by more than 20%, at least for small trai...

Using Prosody to Classify Discourse Relations

This work aims to explore the correlation between the discourse structure of a spoken monologue and its prosody by predicting discourse relations from different prosodic attributes. For this purpose, a corpus of semi-spontaneous monologues in English has been automatically annotated according to the Rhetorical Structure Theory, which models coherence in text via rhetorical relations. From corre...

Using Hedges to Classify Citations in Scientific Articles

Citations in scientific writing fulfil an important role in creating relationships among mutually relevant articles within a research field. These inter-article relationships reinforce the argumentation structure intrinsic to all scientific writing. Therefore, determining the nature of the exact relationship between a citing and cited paper requires an understanding of the rhetorical relations ...

Rhetorical Replica (Badal Bilaqi) and Its Variants in Hafez’s Sonnets

One of the stylistic features of Hafez’s sonnets is the repetition of a part of the meaning of the first line in the second line. His knowledge of the rhetorical and semantic relations of vocabularies enabled him to repeat the meaning with the least verbal repetition. One of the ways that has helped him to achieve this goal is replicating the concepts in two parts of the couplet based on rhetor...

Choosing Rhetorical Relations in Instructional Texts: the Case of Effects and Guidances

In this paper, we address the problem of planning the textual organization of instructions. We take the view that natural language generation (NLG) is a mapping process of different levels of conceptual and textual representations. Within this framework, we consider the mapping between the text's semantic representation and its rhetorical structure. We argue that such a mapping is not direct, b...

Journal:
  • Natural Language Engineering

Volume 14

Publication year: 2008